AITopics | return landscape

6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 16:10:47 GMT

Diversity in equally-performing policies.We show that different neighborhoods correspond to different post-update return distributions and agent behaviors. We discover that at equal average returns, different policies obtained by the same deep RL algorithm may in fact have substantially different distributional profiles, as measured by statistics of the post-update return distribution.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana (0.04)
North America > Canada > Quebec (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Add feedback

6191ab7080c840f67eaf5dff7d5edfcb-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 16:10:44 GMT

machine learning, post-update return distribution, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Neural Information Processing SystemsDec-25-2025, 15:53:32 GMT

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

noisy neighborhood, policy optimization, return landscape, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)

Add feedback

6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 19:14:12 GMT

machine learning, post-update return distribution, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

6191ab7080c840f67eaf5dff7d5edfcb-Paper-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 19:14:08 GMT

machine learning, post-update return distribution, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Neural Information Processing SystemsJan-18-2025, 19:40:19 GMT

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy.

continuous control, noisy neighborhood, policy optimization, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Rahn, Nate, D'Oro, Pierluca, Wiltzer, Harley, Bacon, Pierre-Luc, Bellemare, Marc G.

arXiv.org Artificial IntelligenceOct-25-2023

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

continuous control, noisy neighborhood, policy optimization, (1 more...)

arXiv.org Artificial Intelligence

2309.14597

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Add feedback

Collaborating Authors

return landscape

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf

6191ab7080c840f67eaf5dff7d5edfcb-Paper-Conference.pdf

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf

6191ab7080c840f67eaf5dff7d5edfcb-Paper-Conference.pdf

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control